feat(ci): enforce reference.conf CI check#6795
Conversation
|
Solid CI gate, with two small items to address in this PR plus a recommendation on how to land the broader gate strategy. Two issues to address in this PR1.
|
| Gate | Goal | Catches | Implementation |
|---|---|---|---|
| 1 (this commit) | Naming + depth | Foo, some_key, depth > 5 |
Python + pyhocon |
| 2 | Bean ↔ key parity + default drift | key has no bean field, field has no key, Java default ≠ reference.conf default | Python + javalang or Gradle JUnit using Class.getDeclaredFields() + ConfigFactory.defaultReference() |
| 3 | Binding chain completeness | bean field exists but applyXxxConfig doesn't assign to PARAMETER, or no production consumer reads getXxx() |
Java AST or Gradle JUnit reflection |
| 4 | Inline comment coverage | Leaf keys without #/// documentation |
Python + pyhocon + raw text scan (carry-forward inline comments) |
| 5 | XxxConfigTest coverage |
Bean field present but no test assertion references it | Gradle JUnit (reflection on test bytecode) |
(Gate 6 — deprecation policy via git history scan — is qualitatively different and lower priority; defer.)
Why one PR + multi-commit rather than 5 follow-up PRs:
- Shared infrastructure — Gates 1/2/4 all use pyhocon, share GHA annotation emission, live in
.github/scripts/. Keeping them together avoids cross-PR refactoring. - Coherent design surface — reviewer can spot rules that interact (e.g. comment-coverage gate's line-attribution must match how naming gate walks segments).
- Avoids the "follow-up never happens" trap — multi-PR roadmaps in OSS projects often stall after Gate 1 lands.
- Independent rollback preserved —
git revert <gate-N-commit-sha>is as clean as reverting a standalone PR. - Project velocity context — the config series (refactor(config): simplify applyEventConfig (follow-up to #6615) #6735, fix(config): optimizes config binding for node #6755, refactor(config): merge test config files #6788, refactor(config): overhaul config docs, fix defaults, remove dead params #6790, refactor(config): remove unused storage index and json parsing config #6794, this PR) is concentrated in a few contributors; one focused PR avoids ×5 scoping/CI/review overhead.
Why Gate 2 is the highest-leverage follow-up: it's exactly the gate that would have caught the recent class of issues this series has been fixing manually:
- refactor(config): remove unused storage index and json parsing config #6794 removed
receiveTcpMinDataLength,maxNestingDepth,maxTokenCount,storage.index.directory/switch— all dead config keys discovered by manualgrep. Parity gate would have flagged them. - fix(config): optimizes config binding for node #6755 / refactor(config): overhaul config docs, fix defaults, remove dead params #6790 needed to fix several
allowShieldedTransactionApi/ dns defaults inconsistencies between Java field initializers andreference.conf. Drift gate would have caught them.
Suggested commit layout:
1. feat(ci): add reference.conf naming + depth gate ← current PR content
2. feat(ci): add bean ↔ key parity + default drift gate ← Gate 2 (highest ROI)
3. feat(ci): add config binding chain completeness gate
4. feat(ci): add inline comment coverage gate
5. feat(ci): add XxxConfigTest field coverage gate
For Gate 2 implementation, Gradle JUnit using Java reflection is likely simpler than Python + AST parsing — Class.getDeclaredFields() walks bean fields naturally, no Java AST parser needed. ~4–6 hr starting cost.
TL;DR
- ✅ Approve in principle — Gate 1 is solid, regex/depth logic verified empirically (296 keys, max depth 5)
- 🟠 Fix 2 small items before merge — add
shutdown.BlockTime/Height/Countto ALLOWLIST; tighten regex description in script - 📋 Suggest extending to all 5 gates in this PR as separate commits, rather than 5 follow-up PRs — shared infrastructure, coherent design surface, avoids the "follow-up never happens" trap
|
Two changes have been implemented. Gates 3/4/5 overlap with Gate 2.Gate 2's scope must be clarified first before we can determine whether 3/4/5 still stand independently |
|
Good gate — the naming-convention and depth ceiling cover two of the most common drift vectors. A few additional constraints worth considering for this script or follow-up PRs, roughly in order of implementation cost: 1. Duplicate-key detection HOCON silently keeps the last value when the same key appears twice, making typo-overwrites invisible. A text-level scan (before import collections
seen = collections.Counter()
for line in path.read_text().splitlines():
m = re.match(r'^\s*([\w.]+)\s*[={]', line)
if m:
seen[m.group(1)] += 1
duplicates = [k for k, c in seen.items() if c > 1]2. Port-uniqueness check reference.conf currently defines 10+ default ports. Two services sharing a port is a hard startup failure with a confusing error message. Extracting all *[Pp]ort leaf values and asserting they are distinct costs ~5 lines and 3. No non-empty default for sensitive keys Keys whose path contains password, secret, privateKey, or token should default to "" or be absent. A regex scan over leaf values would catch any accidental weak-credential default before it reaches a release tag. 4. Numeric-range sanity Many leaf types have well-known valid ranges implied by their names: These can be checked with a small name→constraint dispatch table using isinstance(value, int) after pyhocon parsing. 5. Bean-field ↔ key parity (the "Gate 2" mentioned in the PR description) This is the highest-leverage follow-up: reflect each *Config class to enumerate declared fields and cross-check against reference.conf keys in both directions — orphan keys (in conf, no field) and unbound fields (field |
|
Thanks for the thorough list. Status by item: 1 Duplicate-key detection — declined. Duplicate keys in a well-formatted reference.conf are visible in code review (the file has section ordering, and the header documents it). pyhocon exposes no API for it, so the only path is a hand-rolled text scanner — exactly the trap this PR paid for once already: the original homemade scanner mis-handled 2 Port-uniqueness check — implemented in this PR. Added as rule 4 in 3 Sensitive-default scan — not a fit for CI. This is policy-shaped ("what counts as sensitive", "warn vs fail", "where the empty-placeholder allowlist lives"), not a config-grammar check. The decisions are release-engineering and business, not syntactic. CI gates work when the rule is deterministic from the file alone; this one isn't, and pushing it into CI just moves the policy decision into a regex with an ever-growing exemption list. 4 Numeric-range sanity — not a fit for CI either. Range constraints belong with the bean that owns the value (e.g. |
|
For Gate2
With those merged, the follow-up has a smaller, cleaner scope. For this PR: keep it scoped to Gate 1 (naming + depth + port-uniqueness) |
Adds a Python-based CI gate (.github/scripts/check_reference_conf.py)
that parses common/src/main/resources/reference.conf via pyhocon and
enforces two rules on every key path:
1. Each segment must start with a lowercase ASCII letter and contain
only ASCII letters/digits. Acronym runs after position 1 (e.g.
httpPBFTEnable) are accepted: this matches what java.beans.Introspector
tolerates, which is what ConfigBeanFactory actually requires.
2. Maximum nesting depth of 5 segments.
ALLOWLIST contains documented PascalCase exceptions handled via manual
reads in *Config.fromConfig: node.http.PBFTEnable / PBFTPort,
node.rpc.PBFTEnable / PBFTPort, and node.shutdown.BlockTime / BlockHeight /
BlockCount (the last three are currently commented out in reference.conf;
allowlisted to prevent CI breakage if ever uncommented with defaults).
pyhocon version is pinned in .github/scripts/requirements.txt; pip cache
is enabled in pr-check.yml.
Without a CI gate, non-conforming keys silently fail to bind under
ConfigBeanFactory and accumulate. Spring Boot / Quarkus / Micronaut
tolerate kebab-case / snake_case / UPPER_CASE via case normalization;
Typesafe Config's ConfigBeanFactory does not, so the stricter gate is
the right fit for java-tron.
Add rule 4 to the reference.conf gate: leaves whose last segment is `port` or ends in `Port` must bind distinct integer values across the file. Sentinel values 0 and -1 are exempt (disabled / auto-unset). Paths containing `[]` are excluded because list-element ports belong to per-record fields (e.g. genesis witnesses), not to the local process. The check reuses the already-parsed pyhocon ConfigTree — no additional parsing, no new dependencies. Violations are surfaced via the existing consolidated GHA `::error` annotation alongside format/depth violations.
f1747ac to
d7ccca0
Compare
|
[Should] port uniqueness check is undocumented in the PR description The PR description lists three rules (lowerCamelCase, [Discussion] — local fixture suite isn't committed The PR mentions a "Local fixture suite (not committed) covering: empty / simple / dotted-form / at-max-depth / object array / scalar array / allowlist / format violations / depth violations / combined + dedup invariant / env errors." Could we commit these as |
|
Thanks for both items. 1 — Port-uniqueness rule now in the description. Promoted port-uniqueness to a top-level rule in the PR body (alongside lowerCamelCase, depth ≤ 5, and ALLOWLIST). The fixture list in the testing section also expanded to include port-collision cases (same value / sentinel-exempt / array-path-exempt). 2 — Committed test suite: declined for this PR, with reasoning. The ask is reasonable in principle, but committing one here would set a precedent the project hasn't decided on. Current state of CI scaffolding: This PR's script is the only file under
The second decision deserves its own discussion rather than landing implicitly here. Reasons not to bundle:
Happy to revisit as a follow-up PR if the project wants to formalize "CI scripts under test" as a convention. For now the gate's regression coverage is the documented fixture suite + the gate itself running on every PR, which makes any silent regression visible the next time |
What does this PR do?
Adds a CI gate that validates
common/src/main/resources/reference.confon every PR / push, enforcing four rules:^[a-z][a-zA-Z0-9]*$: starts with a lowercase ASCII letter, then ASCII letters/digits only. Acronym runs after position 1 (e.g.httpPBFTEnable,openHistoryQueryWhenLiteFN,allowShieldedTRC20Transaction) are accepted — only the first character is constrained, which matches whatjava.beans.Introspector/ConfigBeanFactoryactually require for bean-property auto-binding.MAX_DEPTHvia a reviewed change). Array steps count toward depth (each[]= +1 level); nested arrays (HOCON list-of-list) are supported.portor ends inPortmust bind distinct integer values. Sentinels0(disabled) and-1(auto/unset) are exempt. Paths containing[]are excluded — per-record fields (e.g. genesis-witness ports), not local-process bindings. Added in response to review feedback to prevent silent port-collision regressions.node.shutdown.*).Implementation:
.github/scripts/check_reference_conf.pyusingpyhocon(pinned via.github/scripts/requirements.txt), wired into the existingcheckstylejob in.github/workflows/pr-check.yml.actions/setup-pythonenables pip caching keyed on the requirements file. Violations produce per-line stdout output plus one consolidated::error file=...,title=reference.conf::...GHA annotation so failures show up in the PR check summary.Why are these changes required?
reference.confis the single source of default values for every config key in java-tron, and key names must auto-bind toXxxConfig.javabean fields via Typesafe Config'sConfigBeanFactory. This gate is stricter than typical Java/Spring projects (Spring Boot'sBinder/ Quarkus / Micronaut tolerate kebab-case, snake_case, UPPER_CASE, and camelCase as equivalent forms) becauseConfigBeanFactoryusesjava.beans.Introspectorand requires the HOCON key to match the JavaBean property name exactly — no case normalization. Without a CI gate, non-conforming keys silently fail to bind and accumulate.A depth ceiling also prevents deeply nested hierarchies from creeping in. The gate is set exactly at the current max with no buffer: silent drift is unacceptable for a mature project, so any new key that needs to go deeper has to bump
MAX_DEPTHexplicitly and be reviewed.Service-port collisions are a hard startup failure with a confusing error message at runtime; the uniqueness check surfaces them at PR time instead.
This PR is the
release_v4.8.2counterpart of #6792 (which targetsdevelop), so the same gate runs on hotfix PRs against the release branch.This PR has been tested by:
release_v4.8.2'sreference.conf— passes with 296 keys, max depth 5, all service ports unique.::errorannotation; a single user-declared deep key produces exactly one depth violation line (leaf-only filtering).Follow up
config.conffiles).*ConfigTest, given that refactor(config,protocol,ci): optimize config binding #6762'snormalizeNonStandardKeys()already provides the runtime equivalent (non-conformant new keys now fail at process startup with a precise config path). Should be a separate PR with refactor(config): remove unused storage index and json parsing config #6794 (dead-key cleanup) and refactor(config): overhaul config docs, fix defaults, remove dead params #6790 (default-value fixes + remove human-readable size parsing) as hard prerequisites.Extra details
config.confcompatible:node.http.PBFTEnable,node.http.PBFTPort,node.rpc.PBFTEnable,node.rpc.PBFTPort, plusnode.shutdown.BlockTime/BlockHeight/BlockCount(PascalCase exceptions handled manually inNodeConfig.fromConfig; currently commented out inreference.conf— pre-listed so the gate stays green if ever uncommented with defaults).pyhoconover Typesafe Config because the gate runs before Gradle/JDK setup in CI (~3 steps: setup-python + pip install + invoke). Both implement the HOCON spec; outputs are equivalent.